00100	
00200	
00300	      PATTERN-MATCHING RULES FOR THE RECOGNITION OF
00400	         NATURAL LANGUAGE DIALOGUE EXPRESSIONS
00500	
00600			Kenneth Mark Colby
00700	                      and
00800		        Roger C. Parkison
00900	
01000	
01100		            INTRODUCTION
01200	
01300		To recognize something is to identify it as an instance of the
01400	"same again". This familiarity is possible because of recurrent
01500	characteristics of the world. We shall describe an algorithm
01600	which recognizes recurrent
01700	characteristics of natural language dialogue expressions. It utilizes
01800	a  multi-stage  sequence  of pattern-matching rules for progressively
01900	transforming these input expressions into a pattern which  eventually
02000	best  matches  a more abstract stored pattern. The name of the stored
02100	pattern has a pointer to  the  name  of  a  response  function  which
02200	decides   what   to   do  once  the  input  has  been  characterized.
02300	Here we shall discuss only the recognizing functions, except for  one
02400	response  function  (anaphoric substitution) which interactively aids
02500	the characterization process.  How  the  response  functions  operate
02600	will be described in a future communication.
02700		In  constructing  and  testing  a  simulation   of   paranoid
02800	processes,  we  were  faced  with the problem of reproducing paranoid
02900	linguistic behavior in a diagnostic  psychiatric  interview.      The
03000	diagnosis   of  paranoid  states,  reactions  or  modes  is  made  by
03100	clinicians who judge a degree of  correspondence  between  what  they
03200	observe  linguistically in an interview and their conceptual model of
03300	paranoid behavior. There exists a  high  degree  of  agreement  among
03400	psychiatrists about this conceptual model which relies mainly on what
03500	an interviewee says and how he says it.
03600		Natural language is a life-expressing  code  people  use  for
03700	communication  with  themselves  and others.  In a real-life dialogue
03800	such as a psychiatric interview,  the  participants  have  interests,
03900	intentions,  and  expectations which are revealed in their linguistic
04000	expressions.    To produce effects on an interviewer which  he  would
04100	judge similar to the effects produced by a paranoid patient, an
04200	interactive  simulation  of  a  paranoid  patient  must  be  able  to
04300	demonstrate  typical  paranoid  interview  behavior.  To  achieve the
04400	desired effects, the paranoid model must have  the  ability  to  deal
04500	with the linguistic behavior of the interviewer.
04600		There  are  a  number of approaches one might consider for an
04700	ideal handling of dialogue expressions.       One approach  would  be
04800	to  construct  a  dictionary  of all expressions which could possibly
04900	arise in an interview.  Associated with each expression would be  its
05000	interpretation,  depending  on  dialogue  context, which in turn must
05100	somehow be defined. For obvious reasons, no one takes  this  approach
05200	seriously.       Instead  of  an  expression  dictionary,  one  might
05300	construct a word dictionary and then use projection rules to yield an
05400	interpretation  of a sentence from the dictionary definitions.   This
05500	has been the approach adopted by most  linguistic  parsers  based  on
05600	transformational or systemic  grammars.     Such  a  method  performs
05700	adequately  as  long  as  the  dictionary involves only a few hundred
05800	words, each word having only one or two senses, and the  dialogue  is
05900	limited  to  a mini-world of a few objects and relations.     But the
06000	problems which arise in a real-time psychiatric  interview  conducted
06100	in unrestricted English are too great for this method to be useful in
06200	actual dialogues requiring immediate responses.
06300		There is little consensus knowledge about how humans  process
06400	natural  language.    They  can be shown to possess some knowledge of
06500	grammar rules but this does not entail that they  use  a  grammar  in
06600	interpreting  and  producing  language.     Irregular verb-tenses and
06700	noun-plurals  do  not  follow  grammatical  rules;  yet  people   use
06800	thousands  of  them.  One  school of linguistics believes that people
06900	possess full transformational grammars for processing  language.   In
07000	our  view  this  position  seems  implausible.  Language  is  what is
07100	recognized but the  processes  involved  may  not  be  linguistic  or
07200	grammatical.   Originally transformational grammars were not designed
07300	to "understand" a large subset of English; they represented a set  of
07400	axioms for deciding whether a string is grammatical.   Efforts to use
07500	them for other purposes have not been very fruitful.
07600		An analysis of what one's problem actually  is  should  guide
07700	the  selection  or invention of methods appropriate to that problem's
07800	solution.  Our problem was not to develop a  consistent  and  general
07900	theory  of  language  nor  to  assert empirically testable hypotheses
08000	about how people process language.   Our problem  was  to  design  an
08100	algorithm  which recognizes what is being said in a dialogue and what
08200	is being said about it in order to make a response such that a sample
08300	of I-O pairs from the paranoid model is judged similar to a sample of
08400	I-O pairs  from  paranoid  patients.       From  the  perspective  of
08500	artificial  intelligence  as  an  attempt to get computers to perform
08600	human intellectual tasks, new methods had to be devised for the  task
08700	of  participating in a human dialogue in a paranoid-patient-like way.
08800	We are not making an existence claim that our methods  represent  the
08900	way  people  process  language.     We sought effective methods which
09000	could operate efficiently in real time. Likewise, we would  not  deny
09100	could operate efficiently in real time. At the same time, we would not deny
09200	methods humans use.   And since our methods provide a general way  of
09300	many-to-one  mapping  from  surface  expressions  to a single stored
09400	pattern, these methods can be used by any type of "host" system which
09500	takes natural language as input.
09600		To  perform  the  task  of  managing  communicative  uses and
09700	effects of natural language, we developed a strategy of  transforming
09800	the  input  until  a  pattern is achieved which matches completely or
09900	partially a more abstract stored pattern.  This strategy  has  proved
10000	adequate for our purposes a satisfactory percentage of the time.  (No
10100	one  expects  any  method to be successful 100% of the time since not
10200	even humans, the best natural  language  processors  around,  achieve
10300	this  level  of  performance).   The power of this method for natural
10400	language dialogues lies  in  its  ability  to  ignore  unrecognizable
10500	expressions  and  irrelevant  details.    A conventional parser doing
10600	word-by-word, parts-of-speech analysis fails when it cannot find  one
10700	or  more  of  the input words in its dictionary. Because it must know
10800	every word, it is too fragile for unrestricted dialogues.
10900		In early versions of the paranoid model (PARRY1), many (but
11000	not all) of the pattern-recognition mechanisms allowed the elements
11100	of the pattern to be order-independent (Colby, Weber, and Hilf,
11200	1971). For example, consider the following expressions:
11300		(1) WHERE DO YOU WORK?
11400		(2) WHAT SORT OF WORK DO YOU DO? 
11500		(3) WHAT IS YOUR OCCUPATION? 
11600		(4) WHAT DO YOU DO FOR A LIVING? 
11700		(5) WHERE ARE YOU EMPLOYED? 
11800	In PARRY1 a procedure would scan these  expressions  looking  for  an
11900	information-bearing contentive such as "work", "for a  living",  etc.
12000	If  it  found  such  a contentive along with a "you" or "your" in the
12100	expression, regardless  of  word  order,  it  would  respond  to  the
12200	expression  as  if it were a question about the nature of one's work.
12300	(There  is  some  doubt  this  even  qualifies  as  a  pattern  since
12400	interrelations  between  words are ignored and only their presence is
12500	considered).    An insensitivity to word order has the advantage that
12600	lexical items representing different parts of  speech  can  represent
12700	the same concept, e.g. "work" as noun or as verb. But a price is paid
12800	for this resilience and elasticity. We found from experience that,
12900	since English relies heavily on word order to convey the meaning of
13000	its messages, the average penalty of misunderstanding (to be
13100	distinguished from ununderstanding) was too great. Hence in PARRY2,
13200	as  will  be described shortly, the patterns require a specified word
13300	order.
13400		It is generally agreed that constraints are useful in
13500	high-complexity problems. Diagnostic psychiatric interviews
13600	(and  especially those conducted over teletypes) have several natural
13700	constraints.  First, clinicians are trained to ask certain  questions
13800	in  certain  ways. These stereotypes can be treated as special cases.
13900	Second, only  a  few  hundred  standard  topics  are  brought  up  by
14000	interviewers  who  are  trained  to  use  everyday  expressions   and
14100	especially  those  used by the patient himself. When the interview is
14200	conducted over teletypes, expressions tend to be shortened since  the
14300	interviewer  tries to increase the information transmission rate over
14400	the slow channel of a teletype.  (It is said that  short  expressions
14500	are  more  grammatical  but  think  about  the phrase "Now now, there
14600	there.") Finally,  teletyped  interviews  represent  written  speech.
14700	Speech is known to be highly redundant; hence many unrecognized words
14800	can be ignored without losing the meaning of  the  message.   Written
14900	speech is loaded with idioms, cliches, pat phrases, etc., all of
15000	which are easy prey for a pattern-matching approach. It is time-wasting
15100	and  usually  futile  to  try  to  decode  an  idiom by analyzing the
15200	meanings of its individual words.
15300		We  shall  now describe the pattern-matching functions of the
15400	algorithm in some detail.
15500	
15600	 		    PREPROCESSING
15700	
15800		Each word in the input expression is first  looked  up  in  a
15900	dictionary of (currently) 1900 entries which, for the sake of speed,
16000	is maintained in core during run-time.   The  dictionary,  which  was
16100	built   empirically  from  thousands  of  teletyped  interviews  with
16200	previous versions of the model, consists of words, groups  of  words,
16300	and  names  of word-classes they can be translated into. (See Fig.  1
16400	for illustrative examples of dictionary entries).  The  size  of  the
16500	dictionary  is  determined by the patterns, i.e. each dictionary word
16600	appears in one or  more  patterns.       Entries  in  the  dictionary
16700	reflect PARRY2's main interests. If a word in the input is not in the
16800	dictionary, it is dropped from the pattern being formed. Thus if  the
16900	input were:
17000		WHAT IS YOUR CURRENT OCCUPATION?
17100	and  the  word  "current"  were not in the dictionary, the pattern at
17200	this stage becomes:
17300		( WHAT IS YOUR OCCUPATION )   
17400	The  question-mark  is  thrown  away as redundant since questions are
17500	recognized by word order. (A statement followed by  a  question  mark
17600	(YOU  GAMBLE?)  is considered to be communicatively equivalent in its
17700	effects  to  that  statement  followed  by   a   period.)   Synonymic
17800	translations  of  words  are  made  so  that the pattern becomes, for
17900	example:
18000		( WHAT BE YOU JOB )
18100	Groups  of  words  are  translated  into a single word-class name, so
18200	that, for example, "for a living" becomes "job".
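	As a minimal sketch of this preprocessing step (in Python, used
here purely for illustration and not the language of the model; the
dictionary entries shown are invented examples, not actual entries
from Fig. 1):

	# Hypothetical fragment of the translation dictionary: keys are
	# words or word groups, values are word-class names.
	DICTIONARY = {
	    "WHAT": "WHAT",
	    "IS": "BE", "ARE": "BE",
	    "YOU": "YOU", "YOUR": "YOU",
	    "OCCUPATION": "JOB", "WORK": "JOB",
	    "FOR A LIVING": "JOB",
	}

	def preprocess(expression):
	    """Translate an input expression into a pattern of word-class
	    names, dropping any word not found in the dictionary."""
	    words = expression.rstrip("?.!").upper().split()
	    pattern, i = [], 0
	    while i < len(words):
	        # Try the longest word group first, so that a group such
	        # as "FOR A LIVING" is translated as a single unit.
	        for j in range(min(len(words), i + 3), i, -1):
	            group = " ".join(words[i:j])
	            if group in DICTIONARY:
	                pattern.append(DICTIONARY[group])
	                i = j
	                break
	        else:
	            i += 1          # unknown word: drop it from the pattern
	    return pattern

	print(preprocess("WHAT IS YOUR CURRENT OCCUPATION?"))
	# prints ['WHAT', 'BE', 'YOU', 'JOB']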
18300		Misspellings   are   handled   in  two  ways.  First,  common
18400	misspellings of important words are simply  put  in  the  dictionary.
18500	Thus  "yoy"  is  known to mean "you". The apostrophe is often omitted
18600	from contractions so most contractions are recognized with or without
18700	it. These common misspellings were gathered from over 4000 interviews
18800	with earlier versions of the paranoid model.
18900		Second,  there  are  5 common forms of typing error which are
19000	checked systematically.  These are:
19100		1) Doubled letter
19200		2) Extraneous letter
19300		3) Forgetting to hold the "shift key" for an apostrophe
19400		4) Hitting a nearby key on the keyboard
19500		5) Transposing two letters in a word
19600	The first three errors can be corrected  by  deleting  the  offending
19700	character from the word. The fourth type of error is checked only
19800	for 10 of the more common near misses. These were also empirically
19900	determined  and include letter pairs like (T Y), (O P), (A S), and (N
20000	M).   These methods are all based on typing  errors,  but  they  also
20100	correct   some   legitimate   English   spelling  errors.  Two-letter
20200	transposition corrects, for example, "beleive" to "believe".
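	The error check can be pictured as generating candidate
corrections and testing each against the dictionary. A minimal sketch
(Python, for illustration only; only four of the ten near-miss key
pairs are shown):

	# Four of the ten empirically determined near-miss key pairs.
	NEARBY = [("T", "Y"), ("O", "P"), ("A", "S"), ("N", "M")]

	def corrections(word, known):
	    """Yield known words reachable from 'word' by undoing one of
	    the five common typing errors."""
	    for i in range(len(word)):
	        # Errors 1-3 (doubled letter, extraneous letter, shift
	        # slip) are all corrected by deleting a character.
	        c = word[:i] + word[i+1:]
	        if c in known:
	            yield c
	        # Error 4: hitting a nearby key.
	        for a, b in NEARBY:
	            for x, y in ((a, b), (b, a)):
	                if word[i] == x and word[:i] + y + word[i+1:] in known:
	                    yield word[:i] + y + word[i+1:]
	        # Error 5: transposing two adjacent letters.
	        if i + 1 < len(word):
	            c = word[:i] + word[i+1] + word[i] + word[i+2:]
	            if c in known:
	                yield c

	print(list(corrections("BELEIVE", {"BELIEVE"})))   # prints ['BELIEVE']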
20300		Certain juxtaposed words are contracted into a  single  word,
20400	e.g.    "GET ALONG WITH" becomes "GETALONGWITH". This is done to deal
20500	with groups of words which are represented as a single element in the
20600	stored pattern, thereby preventing segmentation from occurring at the
20700	wrong places, such as at a preposition inside an idiom. Besides these
20800	contractions,  certain  expansions  are  made  so  that  for example,
20900	"DON'T" becomes "DO NOT" and "I'D" becomes "I WOULD".
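	A sketch of these rewrites (Python, illustrative only; the two
tables are small invented samples), applied to the raw input before
dictionary lookup:

	CONTRACT = {"GET ALONG WITH": "GETALONGWITH"}
	EXPAND = {"DON'T": "DO NOT", "I'D": "I WOULD"}

	def rewrite(text):
	    # Contract idiomatic word groups, then expand contractions.
	    for table in (CONTRACT, EXPAND):
	        for old, new in table.items():
	            text = text.replace(old, new)
	    return text

	print(rewrite("I'D GET ALONG WITH THEM"))
	# prints I WOULD GETALONGWITH THEM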
21000	
21100	                      SEGMENTING
21200	
21300		Another weakness in the crude pattern matching of PARRY1  was
21400	that  it  took  the  entire  input expression as its basic processing
21500	unit. Hence if only two words were recognized in an eight word input,
21600	the  risk  of  misunderstanding was great. We needed a way of dealing
21700	with units shorter than the entire input expression.
21800		Expert telegraphists stay six to twelve words behind a
21900	received message while transcribing it.  Translators wait until they
22000	have heard four to six words before they begin translating.  Aided by a
22100	heuristic from work in machine translation (Wilks, 1973), we devised
22200	a  way  of  bracketing  the pattern constructed up to this point into
22300	shorter segments using prepositions, wh-forms, certain verbs, etc. as
22400	bracketing  points.  (A  list of the bracketing terms appears in Fig.
22500	2). The new pattern formed  is  termed  either  "simple",  having  no
22600	delimiters  within  it,  or "compound", i.e., being made up of two or
22700	more simple patterns. A simple pattern might be:
22800		( WHAT BE YOU JOB )
22900	whereas a compound pattern would be:
23000		(( WHY BE YOU ) ( IN HOSPITAL ))
23100	Our experience with this method of segmentation shows  that  compound
23200	patterns  from teletyped psychiatric dialogues rarely consist of more
23300	than three or four segments. 
23400		After certain verbs (See  Fig.  3)  a  bracketing  occurs  to
23500	replace the commonly omitted "THAT", such that:
23600		( I THINK YOU BE AFRAID )
23700	becomes
23800		(( I THINK ) ( YOU BE AFRAID ))
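	A minimal sketch of the segmenter (Python, illustrative; the
delimiter sets below are small invented samples of the full lists in
Figs. 2 and 3):

	# Illustrative samples of the delimiter lists in Figs. 2 and 3.
	BRACKET_WORDS = {"IN", "FOR", "WHY", "WHAT", "WHERE"}
	THAT_VERBS = {"THINK", "BELIEVE", "KNOW"}

	def segment(pattern):
	    """Bracket a preprocessed pattern into simple segments, opening
	    a new segment at each bracketing term (except in first position)
	    and after a verb whose following "THAT" is commonly omitted."""
	    segments, current = [], []
	    for word in pattern:
	        if current and word in BRACKET_WORDS:
	            segments.append(current)
	            current = []
	        current.append(word)
	        if word in THAT_VERBS:
	            segments.append(current)
	            current = []
	    if current:
	        segments.append(current)
	    # One segment is a "simple" pattern; several form a "compound" one.
	    return segments[0] if len(segments) == 1 else segments

	print(segment(["WHY", "BE", "YOU", "IN", "HOSPITAL"]))
	# prints [['WHY', 'BE', 'YOU'], ['IN', 'HOSPITAL']]
	print(segment(["I", "THINK", "YOU", "BE", "AFRAID"]))
	# prints [['I', 'THINK'], ['YOU', 'BE', 'AFRAID']]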
23900	
24000			   PREPARATION FOR MATCHING
24100	
24200		Conjunctions serve only as markers for the segmenter and they
24300	are dropped out after segmentation.
24400		Negations are  handled  by  extracting  the  "NOT"  from  the
24500	pattern and assigning a value to a global variable which indicates to
24600	the algorithm that the expression is negative in form. When a pattern
24700	is  finally matched, this variable is consulted. Some patterns have a
24800	pointer to a pattern of opposite meaning if  a  "NOT"  could  reverse
24900	their  meanings.     If this pointer is present and a "NOT" is found,
25000	then the pattern matched is replaced by its opposite, e.g.  (  I  not
25100	trust  you  )  is replaced by the pattern ( I mistrust you ). We have
25200	not yet observed the troublesome case of "he gave me not one but  two
25300	messages". (There is no need to scratch where it doesn't itch).
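	In outline (Python, illustrative; the opposite-meaning table is
an invented sample):

	# Invented sample of the opposite-meaning pointers.
	OPPOSITE = {("I", "TRUST", "YOU"): ("I", "MISTRUST", "YOU")}

	def extract_not(segment):
	    """Strip NOT from a segment, returning the stripped segment and
	    a flag recording that the expression was negative in form."""
	    return [w for w in segment if w != "NOT"], "NOT" in segment

	stripped, negative = extract_not(["I", "NOT", "TRUST", "YOU"])
	matched = tuple(stripped)                # suppose this pattern matched
	if negative and matched in OPPOSITE:
	    matched = OPPOSITE[matched]          # consult the flag after matching
	print(matched)                           # prints ('I', 'MISTRUST', 'YOU')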
25400	
25500			   MATCHING AND RECYCLING
25600	
25700		The  algorithm  now  attempts to match the segmented patterns
25800	with stored patterns which currently number about  2200,  1700  being
25900	simple  and 500 being compound. First a complete and perfect match is
26000	sought.  When a match is found, the stored pattern name has a pointer
26100	to  the name of a response function which decides what to do further.
26200	If a match is not found, further transformations of the  pattern  are
26300	carried out and a "fuzzy" match is tried.
26400		For  fuzzy  matching  at this stage, we adopted the heuristic
26500	rule of dropping elements in the pattern one at a time and attempting
26600	a match each time.   This heuristic allows ignoring familiar words in
26700	unfamiliar contexts.   For example, "well" is important in  "Are  you
26800	well?" but meaningless in "Well are you?".
26900	 	Deleting one element at a time results  in,  for example, the
27000	pattern:
27100			( WHAT BE YOU MAIN PROBLEM )
27200	becoming successively:
27300			(a) ( BE YOU MAIN PROBLEM )
27400			(b) ( WHAT YOU MAIN PROBLEM )
27500			(c) ( WHAT BE MAIN PROBLEM )
27600			(d) ( WHAT BE YOU PROBLEM )
27700			(e) ( WHAT BE YOU MAIN )
27800	Since the stored pattern in this case matches (d), (e) would  not  be
27900	constructed.   We  found  it  unwise  to delete more than one element
28000	since our segmentation method usually yields  segments  containing  a
28100	small (1-4) number of words.
28200		Dropping an element at a time provides a probability
28300	threshold for fuzzy matching which is a function of the length of
28400	the segment. If a segment consists of five elements, four of the five
28500	must be present in a particular order (with the fifth element missing
28600	in any position) for a match to occur. If  a  segment  contains  four
28700	elements, three must match, and so forth.
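	A sketch of this fuzzy match (Python, illustrative only):

	def fuzzy_match(segment, stored):
	    """Seek a perfect match; failing that, drop one element at a
	    time (preserving order) and retry, stopping at the first hit."""
	    if tuple(segment) in stored:
	        return tuple(segment)
	    for i in range(len(segment)):
	        candidate = tuple(segment[:i] + segment[i+1:])
	        if candidate in stored:
	            return candidate             # later deletions never built
	    return None                          # default condition

	STORED = {("WHAT", "BE", "YOU", "PROBLEM")}
	print(fuzzy_match(["WHAT", "BE", "YOU", "MAIN", "PROBLEM"], STORED))
	# prints ('WHAT', 'BE', 'YOU', 'PROBLEM'), i.e. deletion (d) above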
28800		The  transformations  described above result in a progressive
28900	coarsening of the patterns by deletion. Substitutions are  also  made
29000	in  certain  cases.  Some patterns contain pronouns which could stand
29100	for a number of different things of importance to PARRY2.       As we
29200	mentioned  in  the  introduction,  there  is  one  case  in which the
29300	response functions  of  memory  aid  in  language  recognition.   One
29400	function  of  memory is to keep track of the context in order to give
29500	pronouns and other anaphoras a correct interpretation.  For  example,
29600	the pattern:
29700		( DO YOU AVOID THEM )
29800	could refer to the Mafia, or racetracks, or other patients, depending
29900	on the context.  When such a pattern is recognized,  the  pronoun  is
30000	replaced by its current anaphoric value as determined by the response
30100	functions, and a more specific pattern such as:
30200		( DO YOU AVOID MAFIA )
30300	is  looked  up.  In many cases, the meaning of a pattern containing a
30400	pronoun is clear without any substitution.  In the pattern:
30500		(( HOW DO THEY TREAT YOU ) ( IN HOSPITAL ))
30600	the meaning of THEY is clarified by ( IN HOSPITAL ).
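	A sketch of the substitution (Python, illustrative; the referent
"MAFIA" is assumed to be supplied by the response functions of
memory):

	PRONOUNS = {"THEY", "THEM", "IT", "HE", "SHE"}

	def substitute_anaphora(segment, stored, referent):
	    """Replace a pronoun by its current anaphoric value and look up
	    the more specific pattern; keep the original if no match."""
	    specific = tuple(referent if w in PRONOUNS else w for w in segment)
	    return specific if specific in stored else tuple(segment)

	STORED = {("DO", "YOU", "AVOID", "MAFIA")}
	print(substitute_anaphora(("DO", "YOU", "AVOID", "THEM"), STORED, "MAFIA"))
	# prints ('DO', 'YOU', 'AVOID', 'MAFIA')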
30700	
30800			   COMPOUND-PATTERN MATCH
30900	
31000		When more than one simple pattern is detected in the input, a
31100	second matching is attempted.  The deletion  methods used are similar
31200	to the first matching except they occur at the segment  level  rather
31300	than at the single element level. Certain patterns, such as ( HELLO )
31400	and ( I THINK ), are dropped because they are considered meaningless.
31500	If  a  complete match is not found, then simple patterns are dropped,
31600	one at a time, from the complex pattern. This allows the input,
31700		(( HOW DO YOU COME ) ( TO BE ) ( IN HOSPITAL ))
31800	to  match  the  stored  pattern,
31900		(( HOW  DO  YOU COME ) ( IN HOSPITAL )).
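	In sketch form (Python, illustrative only):

	MEANINGLESS = {("HELLO",), ("I", "THINK")}

	def match_compound(segments, stored):
	    """Drop meaningless segments; if no complete match, drop one
	    segment at a time, as in the element-level fuzzy match."""
	    segs = [tuple(s) for s in segments if tuple(s) not in MEANINGLESS]
	    if tuple(segs) in stored:
	        return tuple(segs)
	    for i in range(len(segs)):
	        candidate = tuple(segs[:i] + segs[i+1:])
	        if candidate in stored:
	            return candidate
	    return None                          # default condition

	STORED = {(("HOW", "DO", "YOU", "COME"), ("IN", "HOSPITAL"))}
	print(match_compound(
	    [("HOW", "DO", "YOU", "COME"), ("TO", "BE"), ("IN", "HOSPITAL")],
	    STORED))
	# prints (('HOW', 'DO', 'YOU', 'COME'), ('IN', 'HOSPITAL'))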
32000	
32100		If  no  match  can  be found at this point, the algorithm has
32200	arrived at a default condition and the appropriate response functions
32300	decide  what  to  do.  For example, in a default condition, the model
32400	may assume  control  of  the  interview,  asking  the  interviewer  a
32500	question, continuing with the topic under discussion or introducing a
32600	new topic.
32700		An example of interview dialogue is presented in the appendix
32800	showing the transformations the input  expressions  undergo  and  the
32900	patterns they match.
33000	
33100			   ADVANTAGES AND LIMITATIONS
33200	
33300		As   mentioned,   one   of   the   main   advantages   of   a
33400	pattern-matching strategy is that it can ignore as irrelevant what it
33500	does NOT recognize.     There are several million words  in  English,
33600	each  possessing  one  to  over  a  hundred  senses.   To construct a
33700	machine-usable word dictionary  of  this  magnitude  is  out  of  the
33800	question  at this time.    Recognition of natural language input such
33900	as described above allows real-time interaction in a dialogue since
34000	it  avoids  becoming  ensnarled  in combinatorial disambiguations and
34100	long chains of inferencing which would slow a dialogue algorithm down
34200	to  impracticality,  if it could even function at all. The price paid
34300	for pattern-matching is that sometimes, but rarely, ambiguities  slip
34400	through. Eventually a rewrite interpreter must be added to the
34500	pattern-matching rules to deal with ambiguities (Enea and Colby, 1973).
34600		A drawback to PARRY1 was that it reacted to the first pattern
34700	it  found  in the input rather than characterizing the input as fully
34800	as possible and then deciding what to do based on a number of  tests.
34900	Another   practical   difficulty  with  PARRY1  from  a  programmer's
35000	viewpoint was that elements of the patterns were strung out in
35100	various  procedures  throughout  the  algorithm.    It  was  often  a
35200	considerable chore for the programmer to determine  whether  a  given
35300	pattern   was   present   and   precisely   where   it  was.  In  the
35400	above-described method, the patterns are all collected in one part of
35500	the data-base where they can easily be examined.
35600		Concentrating all the patterns in the data-base gives  PARRY2
35700	a  limited  "learning"  ability.   When  an  input fails to match any
35800	stored pattern or matches an incorrect one,  as  judged  by  a  human
35900	operator,  a  pattern  which  matches  the  input can be put into the
36000	data-base automatically.  If the new pattern has the same meaning  as
36100	a previously stored pattern, the human operator must provide the name
36200	of the appropriate response function.  If  he  doesn't  remember  the
36300	name,  he  may  try  to  rephrase the input in a form recognizable to
36400	PARRY2 and it will name the response  function  associated  with  the
36500	rephrasing.  These mechanisms are not "learning" in the commonly used
36600	sense but they do allow a  person  to  transfer  his  knowledge  into
36700	PARRY2's data-base with very little redundant effort.
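	The mechanism amounts to adding an entry to the pattern
data-base whose response-function name comes from the operator,
directly or via a rephrasing. A sketch (Python, illustrative; the
pattern and response-function names are invented):

	STORED = {("WHAT", "BE", "YOU", "JOB"): "JOB-FUNCTION"}   # name invented

	def add_pattern(pattern, stored, rephrasing=None, response=None):
	    """Store a new pattern, taking its response-function name from
	    the operator or from a recognizable rephrasing of the input."""
	    if response is None:
	        response = stored[tuple(rephrasing)]
	    stored[tuple(pattern)] = response

	# An unmatched input is added via a rephrasing PARRY2 recognizes.
	add_pattern(("WHAT", "BE", "YOU", "TRADE"), STORED,
	            rephrasing=("WHAT", "BE", "YOU", "JOB"))
	print(STORED[("WHAT", "BE", "YOU", "TRADE")])   # prints JOB-FUNCTION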
36800		Informal  observation  thus  far  shows  PARRY2's  linguistic
36900	recognition  abilities  to  be  quite  superior  to  PARRY1's. A more
37000	systematic and quantitative evaluation of performance will be carried
37100	out in the future.
37200	
37300	
37400	                      REFERENCES
37500	
37600	Colby, K.M., Weber, S., and Hilf, F.D. (1971). Artificial Paranoia.
37700		ARTIFICIAL INTELLIGENCE, 2, 1-25.
37800	
37900	Enea, H. and Colby, K.M. (1973). Idiolectic language-analysis
38000	       for understanding doctor-patient dialogues. THIRD
38100	       INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE,
38200	       278-284.
38300	
38400	Wilks, Y. (1973). An artificial intelligence approach to machine
38500		translation. In COMPUTER MODELS OF THOUGHT AND 
38600		LANGUAGE, Schank, R. and Colby, K.M., Eds., W.H. Freeman,
38700		San Francisco.